Adapting dependency parsing to spontaneous speech for open domain spoken language understanding
Authors
Abstract
Parsing human-human conversations consists of automatically enriching text transcriptions with semantic structure information. In this paper we use a FrameNet-based approach to semantics that, without requiring a full semantic parse of a message, goes further than a simple flat translation of the message into basic concepts. FrameNet-based semantic parsing may follow a syntactic parsing step. However, spoken conversations in customer service telephone call centers present very specific characteristics, such as non-canonical language, noisy messages (disfluencies, repetitions, truncated words, or automatic speech transcription errors), and superfluous information. For syntactic parsing, the traditional view based on context-free grammars is not suitable for processing non-canonical text. Newer approaches to parsing, based on dependency structures and discriminative machine learning techniques, are better suited to spontaneous speech for two main reasons: (a) they need less training data, and (b) annotating conversation transcripts with syntactic dependencies is simpler than annotating them with syntactic constituents. Another advantage is that partial annotation can be performed. This paper presents the adaptation of a syntactic dependency parser to very spontaneous speech recorded in a call-center environment. The parser is used to produce FrameNet candidates for characterizing conversations between an operator and a caller.
Related resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...
Robust dependency parsing for spoken language understanding of spontaneous speech
We describe in this paper a syntactic parser for spontaneous speech geared towards the identification of verbal subcategorization frames. The parser proceeds in two stages. The first stage is based on generic syntactic resources for French. The second stage is a reranker which is specially trained for a given application. The parser is evaluated on the MEDIA corpus.
Cross-Domain and Cross-Language Porting of Shallow Parsing
English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issu...
Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian
The automatic lemmatization and morpho-syntactic annotation of spoken language is a fairly recent and complex task for Natural Language Processing. The state of the art on written corpora does not provide a satisfactory level of analysis for spontaneous spoken language (Uchimoto et al., 2002; Moreno & Guirao, 2003). The spontaneous speech corpus Italian C-ORALROM has been tagged with ...
Study on Detection of Prosodic Phrase Boundaries in Spontaneous Speech
Prosodic information, which can help with disambiguation, improve the parsing of spoken language, and predict recognition errors, is becoming increasingly important in speech recognition and understanding, especially for spontaneous speech. In this paper, we investigate the detection of phrase boundaries from prosodic features in domain-specific Chinese spontaneous speech. The...